1 Methods and application to operating systems

The classic UNIX code for switching processes is famously opaque and
concise. In Version 6 of UNIX [5], Dennis Ritchie appended a half-hearted
explanation and then added a wry "You are not expected to understand this".
Such complex state changes are at the heart of OS design. In this note, I
will specify what the code does and, I hope, illustrate methods that will
be of reasonably general utility in understanding and designing complex
computer and software systems.

The code itself looks something like this.

    0 switch(process_t *next){
    1     if(save()){
    2         resume(next);
    3         panic("returned from resume");
    4     } else fixmmu(); //switching in
    7     return;
      }

"Save" saves the process state of the current (running) process and
returns "1" so that the running process then calls "resume" with a pointer
to the saved process state of a second process, "next". The "resume"
subroutine restores the state of the process identified by "next" and returns
"0" as if it were returning from save. The newly restored process then
falls through to the code section marked "switching in". The saved process
does not start running again until some other process calls "resume" with
a pointer to its data structure.
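The control flow described above can be imitated in user space. The following is a minimal sketch, assuming a POSIX system that still provides <ucontext.h>; it is not the Version 6 code, and the names ctx_p1, ctx_p2, p2_body, mark and run_switch_demo are invented for illustration. swapcontext plays the role of "save" followed by "resume": the caller does not continue past it until some other context switches back.

```c
#include <assert.h>
#include <string.h>
#include <ucontext.h>

/* Hypothetical two-context analogue of switch(); not the V6 code. */
#define STACK_BYTES (64 * 1024)

static ucontext_t ctx_p1, ctx_p2;    /* saved state of "p1" and "p2" */
static char stack_p2[STACK_BYTES];   /* private stack for p2 */
static char trace[64];               /* records the order of the steps */

void mark(const char *step) { strcat(trace, step); }

/* Body of p2: runs only after p1 switches to it, then switches back. */
void p2_body(void) {
    mark("B");                       /* p2 has been switched in */
    swapcontext(&ctx_p2, &ctx_p1);   /* save p2, resume p1 */
    /* Not reached in this demo: nobody resumes p2 a second time. */
}

/* p1 saves itself, resumes p2, and continues only when p2 resumes it. */
const char *run_switch_demo(void) {
    trace[0] = '\0';
    int rc = getcontext(&ctx_p2);
    assert(rc == 0);
    (void)rc;
    ctx_p2.uc_stack.ss_sp = stack_p2;
    ctx_p2.uc_stack.ss_size = sizeof stack_p2;
    ctx_p2.uc_link = &ctx_p1;        /* fallback if p2_body ever returns */
    makecontext(&ctx_p2, p2_body, 0);

    mark("A");                       /* p1 is about to switch out */
    swapcontext(&ctx_p1, &ctx_p2);   /* save p1, resume p2 */
    mark("C");                       /* p1 has been resumed by p2 */
    return trace;
}
```

Calling run_switch_demo() records the steps in the order A, B, C: the step after the first swapcontext cannot happen until the second context has run and switched back.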

Process p1            Process p2            Process pk

save p1 state
return 1 to "if"
call resume(p2)
                      return to "if"
                      enter else
                      fixmmu
                      return
switchout
                                            save pk state
                                            call resume(p1)
return to "if"

There is no time limit between a process saving and resuming, and the
system can get up to any number of things in between the two operations,
even suspending itself.

When process $p_1$ calls "switch" with next = $p_2$, then $p_1$ will not get
to "return" until $p_2$ has resumed operation and then, in some future state,
some other process $p_k$ (which may or may not be the same as $p_2$) calls
switch with next = $p_1$. One way to express this property is to say that any
path $z$ that starts in process $p_1$ at the start of "switch" and that
terminates in process $p_1$ at the return from "switch" must be factorable
into subpaths that visit a series of intermediate states, as shown in this
diagram.



The diagram is suggestive, but it would be nice to be able to write down 
exactly what it means and then see how other properties interact with this 
property or what else we have to know about the system to assure this 
property. 

Let $w$ be the "current event sequence": the path that leads from the
initial state of the system to the current state.¹ Let $\Lambda$ be the
empty sequence, $wa$ be the sequence obtained by appending event $a$ to
sequence $w$, and $wz$ (also written $w \cdot z$) the sequence obtained by
appending sequence $z$ to sequence $w$. Then $\Lambda$ leads to the initial
state. Appending an event drives the system to a successor state from the
current state. Appending a sequence of events drives the system to a future
state. Recursive relations are sufficient to define many event sequence
dependent variables. Here's a trivial one that just counts all events.

$$\mathrm{Count}(\Lambda) = 0 \quad\text{and}\quad \mathrm{Count}(wa) = \mathrm{Count}(w) + 1$$
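As a small illustration of how such a recursive definition reads as code, here is a sketch in C. The event type is left abstract in the text, so a plain integer stands in for it; event_t, count_rec and count_fold are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Events are left abstract in the text; an int stands in for one here. */
typedef int event_t;

/* Count(Lambda) = 0 and Count(wa) = Count(w) + 1, written recursively.
   The sequence w is the first n entries of events[]. */
size_t count_rec(const event_t *events, size_t n) {
    if (n == 0) return 0;                   /* Count(Lambda) */
    return count_rec(events, n - 1) + 1;    /* Count(wa), a = events[n-1] */
}

/* The same function as a left-to-right fold: one update per event. */
size_t count_fold(const event_t *events, size_t n) {
    size_t state = 0;
    for (size_t i = 0; i < n; i++) {
        (void)events[i];                    /* Count ignores which event */
        state = state + 1;
    }
    return state;
}
```

The fold form is the one that generalizes: any variable defined by a base case on the empty sequence and an update on $wa$ can be computed by carrying one piece of state across the events.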

In what follows, I will assume the existence of a collection of sequence 
dependent variables and functions that provide a window into state. Defi- 
nitions of some of those functions from simpler state variables are given in 
section 3. 

Suppose we have functions Cline and CName so that Cline(w,p) and
CName(w,p) are, respectively, the current line number and current function
name of process p in the source code listing (assumed here to be in "C").
Then a "debugger" view of system state is given by:

$$\mathrm{Loc}(w,p) = (\mathrm{CName}(w,p), \mathrm{Cline}(w,p))$$

For every program variable $x$, we let $V(w,p,x)$ be the current value of
$x$ in the context of process $p$. For example, when $p$ is inside "switch"
the value of $V(w,p,\mathrm{next})$ is the process identifier of the target
of the switch. Note that $f(w \cdot u, g(w))$ evaluates $g$ in the state
reached by $w$ and evaluates $f$ in the state reached by $w \cdot u$. So
$\mathrm{Loc}(w \cdot u, V(w,p,\mathrm{next}))$ is the location of process
$p' = V(w,p,\mathrm{next})$ in the state determined by $w \cdot u$.

Proposition 1.1. If $\mathrm{Loc}(w,p_1) = (\mathrm{switch}, 0)$ and
$V(w,p_1,\mathrm{next}) = p_2 \neq p_1$ then for any $z$ so that
$\mathrm{Loc}(w \cdot z, p_1) = (\mathrm{switch}, 7)$ there must be a process
$p_k \neq p_1$ and sequences $u_1 \cdot u_2 \cdot u_3 \cdot u_4 = z$ so that:

$$\begin{aligned}
\mathrm{Loc}(w \cdot u_1, p_1) &= (\mathrm{switch}, 2) && (1)\\
\mathrm{Loc}(w \cdot u_1 \cdot u_2, p_2) &= (\mathrm{switch}, 4) && (2)\\
\mathrm{Loc}(w \cdot u_1 \cdot u_2 \cdot u_3, p_k) &= (\mathrm{switch}, 0) && (3)\\
V(w \cdot u_1 \cdot u_2 \cdot u_3, p_k, \mathrm{next}) &= p_1 && (4)
\end{aligned}$$

¹ There is a common theory that we have to pretend computational objects are
"non-deterministic", but that seems to be based on mistaking methodological
limitations for fundamental properties.



Let's suppose we have a boolean function "Running" so that
$\mathrm{Running}(w,p) = 1$ if and only if $p$ is active.

• $\mathrm{Running}(w,p) \in \{0, 1\}$;

• $\mathrm{Running}(w,p) > 0$ if and only if $p$ is running in the state
determined by $w$;

• $\mathrm{Running}(\Lambda,p) > 0$ if and only if $p$ is running in the
initial state of the system;

• $\mathrm{Running}(w,p) > \mathrm{Running}(wa,p)$ if and only if event $a$
causes $p$ to stop running if the system is in the state determined by $w$;

• there is a prefix $u$ of $z$ so that $\mathrm{Running}(w \cdot u,p) = 1$ if
and only if $p$ is "sometimes" running during $z$ after the state determined
by $w$.

By using event sequences we get an active view of how variables change
and it is easy to define variables that help reveal the workings of a system.
Here's one that counts the number of times a process has "switched in".

$$\mathrm{In}(\Lambda,p) = 0, \qquad
\mathrm{In}(wa,p) = \begin{cases}
\mathrm{In}(w,p) + 1 & \text{if } \mathrm{Running}(w,p) < \mathrm{Running}(wa,p)\\
\mathrm{In}(w,p) & \text{otherwise}
\end{cases}$$

One of the advantages of the methods used here is that we are not
forced to either enumerate the state set or even explain too much about
the alphabet of events. For something like an OS, the event alphabet is
going to be large and complex and the state set will be worse. Perhaps the
event alphabet will consist of "samples" of the inputs applied to the chips of
the motherboard at each processor cycle. We could imagine these events as
digitized snapshots of signals. Each snapshot then indicates some discrete
interval of time has passed. There may also be events that correspond to
logical changes. But, for now, we can just specify the information we need
to be able to decode from the event stream.
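The switch-in counter described above can be sketched in C once some decoding of the event stream is fixed. The encoding below is purely an assumption made for illustration: each event names the process that is running after the event (single core), and nothing runs in the initial state. The names event_t, NO_PROC, running and switched_in are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event encoding: each event names the process that is
   running once the event has been applied, or NO_PROC if none is. */
typedef int event_t;
enum { NO_PROC = -1 };

/* Running(w,p) for w = events[0..n-1]; assumes nothing runs initially. */
int running(const event_t *events, size_t n, int p) {
    assert(p != NO_PROC);
    if (n == 0) return 0;
    return events[n - 1] == p;
}

/* In(Lambda,p) = 0; In(wa,p) = In(w,p) + 1 exactly when Running rises
   from 0 to 1 across the last event, and In(w,p) otherwise. */
unsigned switched_in(const event_t *events, size_t n, int p) {
    if (n == 0) return 0;
    unsigned before = switched_in(events, n - 1, p);
    int rose = running(events, n - 1, p) < running(events, n, p);
    return rose ? before + 1 : before;
}
```

For the sequence 1, 1, 2, NO_PROC, 1, 2, both process 1 and process 2 are switched in twice; staying on the processor across an event does not count as a new switch-in.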

Let's require that line numbers and source code functions only change
when a process is active.

$$\mathrm{Loc}(wa,p) \neq \mathrm{Loc}(w,p) \Rightarrow \mathrm{Running}(w,p)$$

Note that $\mathrm{Running}(w,p) = 1$ may not mean
$\mathrm{Running}(w,p') = 0$, because we leave open the possibility of
multiple processor cores. More on that below.

Note that $V(wa,p,x) \neq V(w,p,x)$ does not necessarily imply that
$\mathrm{Running}(w,p) = 1$, because many of the objects within the address
space of a process are shared objects. For example, pages may be paged in or
out, data may arrive from a DMA device, there may be notification of an I/O
or other event, and shared data structures will be modified by other
processes. Modularity in operating systems is a tough engineering challenge.

2 Instrumenting the OS 

Proposition 1.1 is a "safety" property — it requires that if there is a path 
from entry to exit, the path must have certain properties. We also need a 
liveness property — that processes will advance from switch to the running 
of the target process. 

If each event defines signals over a specified unit of time, then we can have
Time(w) provide the current time in some sufficiently fine unit. Without
going into too much detail, Time needs to behave sensibly:

$$\mathrm{Time}(w) \leq \mathrm{Time}(w \cdot u)$$

We will often need to count how much time passes during an event or
sequence of events.

$$\mathrm{Time}(w \cdot u) - \mathrm{Time}(w)$$

tells us how much time passes during "u" after "w" and

$$\mathrm{Time}(wa) - \mathrm{Time}(w)$$

measures the time during the single event a. It may be that there are
events that take no real time, or maybe each event corresponds to a sample
of signals during a discrete interval, or even that event duration depends on
history. We don't have to worry about any of that yet.

Let's also suppose we have ValidProcess(w,p) to tell us whether a process
identifier p identifies an actual, instantiated process (on any core) and
we have $\mathrm{Ready}(w,p) \in \{0, 1\}$ to tell us whether a process is
ready to run.

$$\begin{aligned}
\mathrm{ValidProcess}(w,p) &\in \{0, 1\}\\
\mathrm{Ready}(w,p) &\in \{0, 1\}\\
\mathrm{Running}(w,p) &\leq \mathrm{ValidProcess}(w,p)\\
\mathrm{Ready}(w,p) &\leq \mathrm{ValidProcess}(w,p)
\end{aligned}$$

We can now define how long a process has been waiting to run.

$$\mathrm{Waiting}(\Lambda,p) = 0, \qquad
\mathrm{Waiting}(wa,p) = \begin{cases}
(\mathrm{Time}(wa) - \mathrm{Time}(w)) + \mathrm{Waiting}(w,p) & \text{if } \mathrm{Running}(w,p) < \mathrm{Ready}(w,p)\\
0 & \text{otherwise}
\end{cases}$$

A system is $t_{live}$ live if $\mathrm{Waiting}(w,p) < t_{live}$ for all
$w$. Although some researchers have decided that "liveness" should be
considered a property "in the limit" (without an explicit time bound), I
don't think such a version of liveness means anything interesting when we
are discussing engineered discrete state objects.

Proposition 2.1. Calling switch forces process "next" to run within a
fixed time.

There is a $t_{switch}$ so that for any $w$ and $z$:

If $\mathrm{Loc}(w,p) = (\mathrm{switch}, 0)$ and
$\mathrm{Time}(w \cdot z) > \mathrm{Time}(w) + t_{switch}$

then there is a prefix $u$ of $z$ so that
$\mathrm{Loc}(w \cdot u, V(w,p,\mathrm{next})) = (\mathrm{switch}, 5)$

Proposition 2.1 has to be true if the system is $t_{live}$ live. Otherwise,
the switching out process could stall forever.

The two propositions formalize what we want the switch code to do at
a high level, but do not specify how state must be preserved over a switch.
Since process state consists of both shared and non-shared data, we have to
distinguish those:
Proposition 2.2.

If $\mathrm{Loc}(w,p) = (\mathrm{switch}, 3)$ and
$\mathrm{Loc}(w \cdot u,p) = (\mathrm{switch}, 5)$
and there is no proper prefix $z$ of $u$ so that
$\mathrm{Loc}(w \cdot z,p) = (\mathrm{switch}, 5)$
then for any non-shared variable $x$, $V(w,p,x) = V(w \cdot u,p,x)$
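The Waiting recursion and the bounded notion of liveness from earlier in this section can be checked mechanically on a recorded trace. The sketch below makes two assumptions that are not stated in the text: each event has already been decoded into a record for one fixed process p, and the "otherwise" branch of Waiting resets the value to 0. The names step_t, waiting and is_live_on_trace are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoded view of one event a, for a single process p:
   how long the event takes and p's flags in the state BEFORE it. */
typedef struct {
    unsigned dt;      /* Time(wa) - Time(w) */
    int running;      /* Running(w,p), 0 or 1 */
    int ready;        /* Ready(w,p), 0 or 1 */
} step_t;

/* Waiting(Lambda,p) = 0; Waiting(wa,p) accumulates dt while p is ready
   but not running, and (by assumption) resets to 0 otherwise. */
unsigned waiting(const step_t *steps, size_t n) {
    unsigned wait = 0;
    for (size_t i = 0; i < n; i++) {
        if (steps[i].running < steps[i].ready) wait += steps[i].dt;
        else wait = 0;
    }
    return wait;
}

/* Returns 1 if Waiting stays below t_live after every prefix of the
   trace, 0 as soon as some prefix reaches the bound. */
int is_live_on_trace(const step_t *steps, size_t n, unsigned t_live) {
    for (size_t k = 0; k <= n; k++)
        if (waiting(steps, k) >= t_live) return 0;
    return 1;
}
```

A check like this can only refute the bound on the traces it is given; the definition itself quantifies over every sequence w.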

3 Digging down 

Here's a list of functions "assumed" into existence above that need to be 
either justified or defined from simpler elements. 

Cline
CName
SavedRegisters
StackContents
ValidProcess
Ready
Running
Time
Nonshared
V

Let's suppose that the machine has 1 or more cores and that

$$\mathrm{Reg}(w, c, r), \quad \mathrm{Mem}(w, c, loc)$$

are, respectively, the contents of register r on core c and the contents of
memory location loc on core c. For example $\mathrm{Reg}(w, c, PC)$ (program
counter) and $\mathrm{Reg}(w, c, SP)$ (stack pointer) are useful to know.
Given a program listing L and the current program counter, it is reasonably
straightforward to compute Cline and CName, so I won't dig into those
further. Given these values, whether a symbol is a stack or global variable
is also straightforward, so we assume IsStack and IsGlobal can be
constructed. Furthermore, for global variables the correspondence between
name and address is determined by the program listing and some data about
the compiler/linker settings. Suppose there is a memory location current[c]
for each core c that holds the identity of the current process on core c.
Then $\mathrm{Mem}(w, c, current[c])$ is the process running on core c. We
have to require that

$$\mathrm{Mem}(w, c, current[c]) = \mathrm{Mem}(w, c', current[c']) \leftrightarrow c = c'$$






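and then

$$\mathrm{Running}(w,p) = \begin{cases}
1 & \text{if for some } c,\ \mathrm{Mem}(w, c, current[c]) = p\\
0 & \text{otherwise.}
\end{cases}$$

$$\mathrm{Ready}(w,p) = \begin{cases}
1 & \text{if for any } c,\ \mathrm{Bitset}(\mathrm{Mem}(w, c, p + procstatus), READY)\\
  & \text{and } \mathrm{ValidProcess}(w,p)\\
0 & \text{otherwise.}
\end{cases}$$

$$V(wa,p,x) = \begin{cases}
\mathrm{Mem}(wa, c, y) & \text{if } \mathrm{Mem}(w, c, current[c]) = p\\
  & \text{and } \mathrm{IsGlobal}(w,p,x) \text{ and } y = x\\
\mathrm{Mem}(wa, c, y) & \text{if } \mathrm{Mem}(w, c, current[c]) = p\\
  & \text{and } \mathrm{IsStack}(w,p,x) \text{ and } y = x + \mathrm{Reg}(w, c, SP)\\
V(w,p,x) & \text{otherwise.}
\end{cases}$$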
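To make the derivation of Running and Ready from memory contents concrete, here is a minimal sketch over a snapshot of the per-core current slots. The layout is an assumption for illustration only: NCORES, the READY bit value, proc_t and snapshot_t are invented, and idle cores are represented by NULL and exempted from the uniqueness requirement.

```c
#include <assert.h>
#include <stddef.h>

#define NCORES 4          /* hypothetical core count */
#define READY  0x1u       /* hypothetical bit in the status word */

typedef struct {
    int valid;            /* stands in for ValidProcess(w,p) */
    unsigned status;      /* stands in for Mem(w, c, p + procstatus) */
} proc_t;

/* Snapshot of the state reached by some w: one current slot per core. */
typedef struct {
    const proc_t *current[NCORES];   /* NULL if the core is idle */
} snapshot_t;

/* Running(w,p) = 1 if some core's current slot holds p, else 0. */
int running(const snapshot_t *s, const proc_t *p) {
    assert(p != NULL);
    for (int c = 0; c < NCORES; c++)
        if (s->current[c] == p) return 1;
    return 0;
}

/* Ready(w,p) = 1 if the READY bit is set and p is a valid process. */
int ready(const proc_t *p) {
    return p->valid && (p->status & READY) != 0;
}

/* The uniqueness requirement: no process is current on two cores. */
int current_is_injective(const snapshot_t *s) {
    for (int a = 0; a < NCORES; a++)
        for (int b = a + 1; b < NCORES; b++)
            if (s->current[a] && s->current[a] == s->current[b]) return 0;
    return 1;
}
```

The injectivity check is the kind of invariant one would assert after every event in a simulation or test harness.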



If $\mathrm{Reg}(w, c, SP)$ is the contents of the stack pointer register on
core c, then $\mathrm{Mem}(w, c, \mathrm{Reg}(w, c, SP))$ is the contents of
the top of the stack on processor core c (assuming alignment and so on). In
many operating systems, the kernel stack of a process, which is what we are
discussing here, is fixed size and "grows down" by subtraction from, for
example, an 8K boundary. One of the reasons for doing this is that it's easy
to calculate the stack base by
bitwiseand(stackaddress + 8191, bitinvert(8191)) if the stack is 8K (8192
bytes) and on an 8K boundary. In that case, we can define StackContents so
it captures the stack.

$$\mathrm{StackContents}(wa,p) = \begin{cases}
(\mathrm{Mem}(w,c,a), \ldots, \mathrm{Mem}(w,c,b)) & \text{if } \mathrm{Running}(w,p)\\
  & \text{and } \mathrm{Mem}(w, c, current[c]) = p\\
  & \text{and } a = \mathrm{Reg}(w, c, SP)\\
  & \text{and } b = \mathrm{bitwiseand}(a + 8191, \mathrm{bitinvert}(8191))\\
  & \text{and increments between } a \text{ and } b \text{ are by wordsize}\\
\mathrm{StackContents}(w,p) & \text{otherwise}
\end{cases}$$

Note that StackContents is defined so that it does not change when the 
process is not running. If we dig down to the assembler level, we'd probably 
want to be sure that the stack contents at the point of return from save was 
the same as that at the point of return from resume. 
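The stack-base calculation mentioned above is a standard round-up-to-alignment mask. The sketch below assumes the mask is 8191, that is 8192 - 1, for an 8K stack on an 8K boundary; KSTACK_SIZE, stack_base and stack_bytes_used are illustrative names, not taken from any particular kernel.

```c
#include <assert.h>
#include <stdint.h>

#define KSTACK_SIZE ((uintptr_t)8192)   /* 8K, must be a power of two */

/* Round sp up to the next 8K boundary: the base that a downward-growing
   stack started from. An sp exactly on a boundary maps to itself. */
uintptr_t stack_base(uintptr_t sp) {
    uintptr_t mask = KSTACK_SIZE - 1;   /* 8191 = 0x1fff */
    return (sp + mask) & ~mask;
}

/* Bytes currently in use between sp and the base. */
uintptr_t stack_bytes_used(uintptr_t sp) {
    return stack_base(sp) - sp;
}
```

For example, a stack pointer of 0x11ff8 rounds up to a base of 0x12000, so 8 bytes are in use. The mask trick only works because the size is a power of two.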
Function          Derived from

Cline             Reg(w, c, PC)
CName             Reg(w, c, PC)
SavedRegisters    Reg(w, c, ..)
StackContents     Reg(w, c, SP) and Mem(w, c, ..)
ValidProcess      Mem(w, c, p->status)
Ready             Mem(w, c, p->status)
Running           Reg(w, c, current)
Time              primitive
Nonshared         symbol table
V                 Mem(w, c, ...)



4 Parallelism and encapsulation 

Parallelism is a huge issue in "formal methods" but appears naturally here.
For example, it is certainly possible that for some $w$ and $a$ there are
several cores $c$ so that $\mathrm{Reg}(wa, c, PC) \neq \mathrm{Reg}(w, c, PC)$.
We have not yet had to specify anything about the way the cores change
state in parallel; they are simply specified in a way that makes it
possible. In some cases, however, we want to describe systems in which the
architecture of components is specified, and that is also straightforward.

Consider an abstract model of process interaction where processes can
either wait for or generate events, and only one process can advance per
core. We are going to want to connect up a collection of these processes so
that they communicate synchronously.

[Figure: state diagram of an abstract process; the only surviving labels are
"step" and "output-running".]

Note that the diagram obscures the intent that there may be many different
states where output is running, waiting, or sending.

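Definition 4.1. $f$ is an abstract state process over $P$ and $X$ with id
$p_0$ if and only if

$$\begin{aligned}
& f(w) \in \{\mathrm{running}, \mathrm{waiting}[p], \mathrm{sending}[x,p] : x \in X, p \in P\}\\
\text{and } & f(w) \neq \mathrm{running} \Rightarrow f(w \cdot \langle \mathrm{step} \rangle) = f(w)\\
\text{and } & f(w) = \mathrm{sending}[x,p] \Rightarrow f(w \cdot \langle \mathrm{send} \rangle) = \mathrm{idle}\\
\text{and } & f(w) = \mathrm{waiting}[p] \Rightarrow f(w \cdot \langle \mathrm{receive}[x,p] \rangle) = \mathrm{idle}\\
\text{and } & f(w) \neq \mathrm{waiting}[p_0] && \text{(never wait for self)}\\
\text{and } & f(w) \neq \mathrm{sending}[x,p_0] && \text{(never send to self)}
\end{aligned}$$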

Many distinct sequence dependent functions can satisfy this specification.
That is, we can have $A_1$ and $A_2$ that are both abstract processes by
this definition where $A_1(w) \neq A_2(w)$ for some or even most $w$. An
abstract process that is "running" has some internal procedure for deciding
when to request to send or receive a message. We do not need, now, to decide
what that process is, but it could easily be the execution of a program it
receives as a message or something fixed in its internal operation or some
combination. Finally, we have not specified what happens when unwanted
events happen, such as a receive from $p'$ when the process wants to receive
from $p$.

Now let's define a connected system of such abstract processes. Suppose
that each of $A_{p_1} \ldots A_{p_k}$ are abstract processes and define

$$F(w,p) = A_p(w_p)$$

where we will define $w_p$ recursively.

$$\Lambda_p = \Lambda \quad\text{and}\quad (wa)_p = w_p \cdot g(w,a,p)$$

and

$$g(w,a,p) = \begin{cases}
\langle \mathrm{receive}[x,q] \rangle & \text{if } A_p(w_p) = \mathrm{waiting}[q] \text{ and } A_q(w_q) = \mathrm{sending}[x,p]\\
\langle \mathrm{send} \rangle & \text{if } A_p(w_p) = \mathrm{sending}[x,q] \text{ and } A_q(w_q) = \mathrm{waiting}[p]\\
\langle \mathrm{step} \rangle & \text{if } A_p(w_p) = \mathrm{running} \text{ and } \mathrm{Running}(w,p)\\
\Lambda & \text{otherwise.}
\end{cases}$$

Note that $p$ only gets to "step" if it is selected as the running process in
the encompassing environment of the operating system.
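The connecting function g can be read as a small dispatch routine: look at the current outputs of the components and decide which event, if any, to hand to process p. The sketch below is one possible rendering under invented types; pmode_t, pstate_t, pevent_t and the sched_running flag (standing in for Running(w,p)) are assumptions, and the outer event a is omitted because the case conditions in the definition do not inspect it.

```c
#include <assert.h>

typedef enum { P_RUNNING, P_WAITING, P_SENDING } pmode_t;

/* Output of one abstract process, A_p(w_p), in some state. */
typedef struct {
    pmode_t mode;
    int peer;      /* q in waiting[q] or sending[x,q]; unused if running */
    int msg;       /* x in sending[x,q]; unused otherwise */
} pstate_t;

typedef enum { EV_NONE, EV_STEP, EV_SEND, EV_RECEIVE } evkind_t;

/* The event appended to w_p; EV_NONE plays the role of Lambda. */
typedef struct { evkind_t kind; int msg; int from; } pevent_t;

/* g(w,a,p): procs[] holds every component's current output and
   sched_running stands in for Running(w,p). */
pevent_t g(const pstate_t *procs, int p, int sched_running) {
    assert(p >= 0);
    pevent_t none = { EV_NONE, 0, 0 };
    const pstate_t *me = &procs[p];
    if (me->mode == P_WAITING) {
        const pstate_t *q = &procs[me->peer];
        if (q->mode == P_SENDING && q->peer == p)
            return (pevent_t){ EV_RECEIVE, q->msg, me->peer };
        return none;
    }
    if (me->mode == P_SENDING) {
        const pstate_t *q = &procs[me->peer];
        if (q->mode == P_WAITING && q->peer == p)
            return (pevent_t){ EV_SEND, 0, 0 };
        return none;
    }
    /* me->mode == P_RUNNING: step only if the scheduler selected p. */
    return sched_running ? (pevent_t){ EV_STEP, 0, 0 } : none;
}
```

With process 0 sending message 42 to process 1 and process 1 waiting on process 0, the same outer event delivers a send to 0 and a receive of 42 to 1, which is the synchronous rendezvous the definition describes.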
5 Conclusion and mathematical note

In brief, sequence functions are representations of Moore type state
machines. Given a sequence function $f$ over alphabet $B$ let $B^*$ be the
set of finite sequences over $B$ including $\Lambda$ and define

$$w \sim_f z \iff \forall u \in B^*,\ f(w \cdot u) = f(z \cdot u)$$

Then define $[w]_f = \{z : z \sim_f w\}$ and consider the set of these
equivalence classes $S_f = \{[w]_f : w \in B^*\}$. Define
$\delta_f([w]_f, a) = [wa]_f$ and define $\gamma_f([w]_f) = f(w)$. Then
$M_f = (B, S_f, [\Lambda]_f, \delta_f, \gamma_f)$ is a classical (although
not necessarily finite) Moore machine with state set $S_f$, initial state
$[\Lambda]_f$, transition map $\delta_f$, and output map $\gamma_f$.

Conversely, given a Moore machine $M = (B, S, s_0, \delta, \gamma)$ define
$f_M$ so that $f_M(w) = \gamma(\delta^*(w))$ where $\delta^*(\Lambda) = s_0$
and $\delta^*(wa) = \delta(\delta^*(w), a)$.
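The converse construction is short enough to write out. The following sketch uses a made-up three-state machine (count of 1 events modulo 3, output 1 when the count is 0) purely as an example; delta, gamma_out, delta_star and f_M are illustrative names for the transition map, the output map, its extension to sequences, and the resulting sequence function.

```c
#include <assert.h>
#include <stddef.h>

/* A hypothetical Moore machine over events {0,1}: the state is the
   number of 1s seen so far modulo 3; the output is 1 when it is 0. */
typedef int event_t;
typedef int state_t;

static const state_t s0 = 0;                       /* initial state */

state_t delta(state_t s, event_t a) { return a ? (s + 1) % 3 : s; }
int gamma_out(state_t s) { return s == 0; }

/* delta*(Lambda) = s0 and delta*(wa) = delta(delta*(w), a). */
state_t delta_star(const event_t *w, size_t n) {
    state_t s = s0;
    for (size_t i = 0; i < n; i++) s = delta(s, w[i]);
    return s;
}

/* f_M(w) = gamma(delta*(w)): the sequence function the machine
   represents. */
int f_M(const event_t *w, size_t n) {
    return gamma_out(delta_star(w, n));
}
```

In this example the sequences $\Lambda$ and 1, 1, 1 land in the same state, so they are equivalent under $\sim_f$ for $f = f_M$: no continuation $u$ can tell them apart.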

The encapsulation of section 4 corresponds to a Moore machine product
called the general product [2]. For simplicity let's define this product
for finite numbers of state machines. Suppose $f : B^* \times X \to Y$
where $X = \{x_1, \ldots, x_k\}$ is defined by $f(w, x_i) = g_i(w_{x_i})$
where $\Lambda_x = \Lambda$ and
$(wa)_x = w_x \cdot p(f(w, x_1), \ldots, f(w, x_k), a, x)$. For even more
simplicity, suppose $p(y_1, \ldots, y_k, a, x_i) \in B_i$. Then for each $i$
we can construct a machine $M_{g_i} = (B_i, S_i, s_{0i}, \delta_i, \gamma_i)$
using the construction above. Define a product by
$M_f = (\cup_i B_i, \prod_i S_i, (s_{01}, \ldots, s_{0k}), \delta, \gamma)$.
Each state of $M_f$ is a $k$-tuple $s = (s_1, \ldots, s_k) \in \prod_i S_i$.
The transition function $\delta$ is constructed as follows:

$$\delta(s, a) = (\delta_1(s_1, p(\gamma_1(s_1), \ldots, \gamma_k(s_k), a, x_1)), \ldots, \delta_k(s_k, p(\gamma_1(s_1), \ldots, \gamma_k(s_k), a, x_k))).$$

Finally: $\gamma((s_1, \ldots, s_k)) = (\gamma_1(s_1), \ldots, \gamma_k(s_k))$.
Then $f_{M_f}(w) = (f(w, x_1), \ldots, f(w, x_k))$.

It may be seen why the functional representation is advantageous in some
situations.

Consideration of the algebraic basis of state machine theory and the
relationship between state machines and semigroups indicates that there may
be some value in looking at the algebraic structure of sequence dependent
functions. If $\equiv_f$ is defined so that

$$w \equiv_f u \iff \forall z_1, z_2,\ f(z_1 \cdot w \cdot z_2) = f(z_1 \cdot u \cdot z_2)$$

then the congruence classes $[[w]]_f = \{u : u \equiv_f w\}$ form a monoid
under the operation $[[w]]_f \times [[u]]_f = [[w \cdot u]]_f$. If we
constrain $p$ to not depend on any feedback, so that transitions to $M_i$
depend only on outputs of $M_j : j < i$, then the results of Krohn-Rhodes
theory as described in Holcombe [4], Arbib [1] and Ginzburg [3] apply. What
happens if $p$ is constrained in other ways, such as by a certain circuit
design discipline? Also, in databases, using some circuit disciplines, and
in other situations, invertibility is a useful property. Invertibility
produces sequence functions that correspond to groups.

A much earlier version of this work can be found in [9] and [8] and much 
earlier in [7] with applications in [6] and [10]. Unfortunately, it took me 
many years to understand good advice from Professor George Avrunin that 
the formal logic notation was an impediment instead of an advantage. 